What is claimed is:

1. A method of smoothing fundamental frequency discontinuities at boundaries of concatenated speech segments, each speech segment characterized by a segment fundamental frequency contour and including two or more frames, comprising:

determining, for each speech segment, a beginning fundamental frequency value and an ending fundamental frequency value;

adjusting the fundamental frequency contour of each of the speech segments according to a predetermined function calculated for each particular speech segment, wherein parameters characterizing each predetermined function are selected according to the beginning fundamental frequency value and the ending fundamental frequency value of the corresponding speech segment.
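As an informal illustration of claim 1 (not part of the claims), the sketch below applies a linear correction to one segment's contour so that its first and last frames reach chosen target values. The function name `adjust_contour`, the use of the first and last frame as the beginning and ending values, and the additive form of the correction are all assumptions made for this example; the claims do not fix any of them.

```python
def adjust_contour(contour, new_begin, new_end):
    """Shift a per-frame F0 contour (Hz) with a linear function of frame index.

    Assumption: the segment's beginning/ending values are simply its first
    and last frames, and the correction is additive. Both are illustrative.
    """
    n = len(contour)
    if n < 2:
        raise ValueError("a segment needs two or more frames")
    d_begin = new_begin - contour[0]   # offset required at the first frame
    d_end = new_end - contour[-1]      # offset required at the last frame
    # The correction varies linearly from d_begin to d_end across the segment,
    # which changes both the offset and the slope of the contour.
    return [
        f0 + d_begin + (d_end - d_begin) * i / (n - 1)
        for i, f0 in enumerate(contour)
    ]
```

For example, `adjust_contour([100.0, 110.0, 120.0], 105.0, 115.0)` raises the start by 5 Hz and lowers the end by 5 Hz while leaving the middle frame unchanged.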

2. A method according to claim 1, wherein the predetermined function adjusts a slope associated with the speech segment.

3. A method according to claim 1, wherein the predetermined function adjusts an offset associated with the speech segment.

4. A method according to claim 1, wherein the predetermined function includes a linear function.

5. A method according to claim 1, wherein the predetermined function calculated for each particular speech segment is dependent upon a length associated with the speech segment, such that the predetermined function adjusts longer segments more than shorter segments.

6. A method according to claim 1, further including determining, for each speech segment, one or more parameters selected from: (i) a total duration of the segment; (ii) a total duration of all voiced regions of the segment; (iii) an average value of the fundamental frequency contour over all voiced regions of the segment; (iv) a median value of the fundamental frequency contour over all voiced regions of the segment; and (v) a standard deviation of the fundamental frequency contour over the whole segment.

7. A method according to claim 6, further including setting the determined median value of the fundamental frequency contour over all voiced regions of the segment to the average value of the fundamental frequency contour over all voiced regions of the segment if a number of fundamental frequency samples in the speech segment is less than a predetermined value.
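As an informal illustration of the segment parameters in claims 6 and 7 (not part of the claims), the sketch below computes them for one segment. Several details are assumptions for the example only: unvoiced frames are marked by an F0 of 0, the frame period is 10 ms, the sample count in claim 7 is taken to be the number of voiced frames, and the standard deviation is computed over voiced frames. The names `characterize_segment`, `frame_period_s`, and `min_samples` are hypothetical.

```python
import statistics


def characterize_segment(contour, frame_period_s=0.01, min_samples=5):
    """Return duration and F0 statistics for one segment.

    contour: per-frame F0 values in Hz; 0 marks an unvoiced frame (assumption).
    """
    voiced = [f0 for f0 in contour if f0 > 0]
    stats = {
        "total_duration_s": len(contour) * frame_period_s,
        "voiced_duration_s": len(voiced) * frame_period_s,
        "mean_f0": None,
        "median_f0": None,
        "std_f0": None,
    }
    if voiced:
        stats["mean_f0"] = statistics.fmean(voiced)
        stats["median_f0"] = statistics.median(voiced)
        # Assumption: deviation over voiced frames only, so that the zeros
        # standing in for unvoiced frames do not distort the result.
        stats["std_f0"] = statistics.pstdev(voiced)
        # Claim 7: with too few samples the median is replaced by the mean.
        if len(voiced) < min_samples:
            stats["median_f0"] = stats["mean_f0"]
    return stats
```

With the contour `[0, 100, 100, 130, 0]` and the default `min_samples=5`, only three voiced frames are available, so the reported median is the mean (110 Hz) rather than the true median (100 Hz).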

8. A method according to claim 1, further including examining a predetermined number of frames from a beginning point of each speech segment, and setting the beginning fundamental frequency value to a fundamental frequency value of the first frame if all fundamental frequency values of the predetermined number of frames from the beginning point of the speech segment are within a predetermined range.

9. A method according to claim 1, further including examining a predetermined number of frames from an ending point of each speech segment, and setting the ending fundamental frequency value to a fundamental frequency value of the last frame if all fundamental frequency values of the predetermined number of frames from the ending point of the speech segment are within a predetermined range.
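As an informal illustration of claims 8 and 9 (not part of the claims), the sketch below inspects a fixed number of frames at one end of a segment and accepts the edge frame's value only if every inspected frame lies within a range. The reading of "predetermined range" as absolute bounds in Hz, the default values, and the name `boundary_value` are assumptions. The claims do not say what happens when the test fails, so the sketch returns `None` and leaves that choice to the caller.

```python
def boundary_value(frames, n_frames=3, lo=50.0, hi=500.0, from_end=False):
    """Return the edge frame's F0 if the inspected frames are all in [lo, hi].

    frames: per-frame F0 values in Hz for one segment.
    from_end: inspect the last n_frames (claim 9) instead of the first (claim 8).
    """
    if not frames:
        return None
    window = frames[-n_frames:] if from_end else frames[:n_frames]
    edge = window[-1] if from_end else window[0]
    if all(lo <= f0 <= hi for f0 in window):
        return edge
    return None  # fallback is not specified by the claims (assumption)
```

For the frames `[100, 102, 101, 180]`, the beginning value is 100 and the ending value is 180; a leading unvoiced frame coded as 0 makes the beginning test fail.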

10. A method according to claim 1, further including setting the beginning fundamental frequency and the ending fundamental frequency of unvoiced speech segments to a value substantially equal to a median value of the fundamental frequency contour over all voiced regions of a preceding voiced segment.

11. A method according to claim 1, further including calculating, for each pair of adjacent speech segments n and n+1, one or more of: (i) a first ratio of the nth ending fundamental frequency value to the (n+1)th beginning fundamental frequency value; and (ii) a second ratio being the inverse of the first ratio; and adjusting the nth ending fundamental frequency value and the (n+1)th beginning fundamental frequency value only if the first ratio and/or the second ratio are less than a predetermined ratio threshold.
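As an informal illustration of the ratio test in claim 11 (not part of the claims), the sketch below decides whether a boundary between two adjacent segments should be adjusted. It takes one possible reading of "and/or", requiring both ratios to be below the threshold, which amounts to skipping boundaries where the two values differ by too large a factor. The threshold value of 2.0 and the name `should_adjust` are hypothetical.

```python
def should_adjust(end_n, begin_next, ratio_threshold=2.0):
    """Return True if the boundary between segments n and n+1 may be smoothed.

    end_n: ending F0 (Hz) of segment n; begin_next: beginning F0 of segment n+1.
    Assumption: both values are positive (voiced), and both ratios must be
    below the threshold.
    """
    first_ratio = end_n / begin_next
    second_ratio = begin_next / end_n   # inverse of the first ratio
    return first_ratio < ratio_threshold and second_ratio < ratio_threshold
```

Under these assumptions, a boundary from 100 Hz to 110 Hz qualifies for adjustment, while a jump from 100 Hz to 250 Hz does not, because the second ratio is 2.5.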

12. A method according to claim 1, further including calculating the function for each individual speech segment according to a coupled spring model.

13. A method according to claim 12, further including implementing the coupled spring model such that a first spring component couples the beginning fundamental frequency value to an anchor component, a second spring component couples the ending fundamental frequency value to the anchor component, and a third spring component couples the beginning fundamental frequency value to the ending fundamental frequency value.

14. A method according to claim 13, further including associating a spring constant with the first spring and the second spring such that the spring constant is proportional to a duration of voicing in the associated speech segment.

15. A method according to claim 13, further including associating a spring constant with the third spring such that the third spring models a non-linear restoring force that resists a change in slope of the segment fundamental frequency contour.

16. A method according to claim 12, further including forming a set of simultaneous equations corresponding to the coupled spring models associated with all of the concatenated speech segments, and solving the set of simultaneous equations to produce the parameters characterizing each linear function associated with one of the speech segments.

17. A method according to claim 16, further including solving the set of simultaneous equations through an iterative algorithm based on Newton's method of finding zeros of a function.
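As an informal illustration of the Newton iteration named in claim 17 (not part of the claims), the sketch below finds the equilibrium of a deliberately simplified, one-unknown spring system: a linear spring pulling a value toward an anchor, balanced against a cubic (non-linear) spring pulling it toward a rest value. This toy force law, its constants, and the names `newton_root` and `toy_equilibrium` are assumptions; the claimed model couples all segments through a set of simultaneous equations and is not reproduced here.

```python
def newton_root(f, df, x0, tol=1e-9, max_iter=50):
    """Find a zero of f by Newton's method, starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)   # Newton update: x <- x - f(x) / f'(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


def toy_equilibrium(anchor=100.0, rest=120.0, k=1.0, c=0.01):
    """Solve k*(x - anchor) + c*(x - rest)**3 = 0 for x (toy force balance)."""
    def net_force(x):
        return k * (x - anchor) + c * (x - rest) ** 3

    def d_net_force(x):
        return k + 3.0 * c * (x - rest) ** 2   # always positive for k, c > 0

    return newton_root(net_force, d_net_force, x0=anchor)
```

With the default constants the balance point is 110 Hz, where the linear pull of 10 is cancelled by the cubic term 0.01 × (−10)³ = −10. A multi-segment system would apply the same update with a Jacobian matrix in place of the scalar derivative.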

18. A system for smoothing fundamental frequency discontinuities at boundaries of concatenated speech segments, each speech segment characterized by a segment fundamental frequency contour and including two or more frames, comprising:

a unit characterization processor for receiving the speech segments and characterizing each segment with respect to a beginning fundamental frequency and an ending fundamental frequency;

a fundamental frequency adjustment processor for receiving the speech segments, the beginning fundamental frequency and ending fundamental frequency, and for adjusting the fundamental frequency contour of each of the speech segments according to a predetermined function calculated for each particular speech segment, wherein parameters characterizing each predetermined function are selected according to the beginning fundamental frequency value and the ending fundamental frequency value of the corresponding speech segment.

19. A system according to claim 18, wherein the predetermined function adjusts a slope associated with the speech segment.

20. A system according to claim 18, wherein the predetermined function adjusts an offset associated with the speech segment.

21. A system according to claim 18, wherein the predetermined function includes a linear function.

22. A system according to claim 18, wherein the predetermined function calculated for each particular speech segment is dependent upon a length associated with the speech segment, such that the predetermined function adjusts longer segments more than shorter segments.

23. A system according to claim 18, wherein the unit characterization processor determines, for each speech segment, one or more of: (i) a total duration of the segment; (ii) a total duration of all voiced regions of the segment; (iii) an average value of the fundamental frequency contour over all voiced regions of the segment; (iv) a median value of the fundamental frequency contour over all voiced regions of the segment; and (v) a standard deviation of the fundamental frequency contour over the whole segment.

24. A system according to claim 23, wherein the unit characterization processor sets the determined median value of the fundamental frequency contour over all voiced regions of the segment to the average value of the fundamental frequency contour over all voiced regions of the segment if a number of fundamental frequency samples in the speech segment is less than a predetermined value.

25. A system according to claim 18, wherein the unit characterization processor examines a predetermined number of frames from a beginning point of each speech segment, and sets the beginning fundamental frequency value to a fundamental frequency value of the first frame if all fundamental frequency values of the predetermined number of frames from the beginning point of the speech segment are within a predetermined range.

26. A system according to claim 18, wherein the unit characterization processor examines a predetermined number of frames from an ending point of each speech segment, and sets the ending fundamental frequency value to a fundamental frequency value of the last frame if all fundamental frequency values of the predetermined number of frames from the ending point of the speech segment are within a predetermined range.

27. A system according to claim 18, wherein the unit characterization processor sets the beginning fundamental frequency and the ending fundamental frequency of unvoiced speech segments to a value substantially equal to a median value of the fundamental frequency contour over all voiced regions of a preceding voiced segment.

28. A system according to claim 18, wherein the unit characterization processor calculates, for each pair of adjacent speech segments n and n+1, one or more of: (i) a first ratio of the nth ending fundamental frequency value to the (n+1)th beginning fundamental frequency value; and (ii) a second ratio being the inverse of the first ratio, and adjusts the nth ending fundamental frequency value and the (n+1)th beginning fundamental frequency value only if the first ratio and/or the second ratio are less than a predetermined ratio threshold.

29. A system according to claim 18, wherein the fundamental frequency adjustment processor calculates the linear function for each individual speech segment according to a coupled spring model.

30. A system according to claim 29, wherein the fundamental frequency adjustment processor implements the coupled spring model such that a first spring component couples the beginning fundamental frequency value to an anchor component, a second spring component couples the ending fundamental frequency value to the anchor component, and a third spring component couples the beginning fundamental frequency value to the ending fundamental frequency value.

31. A system according to claim 30, wherein the fundamental frequency adjustment processor associates a spring constant with the first spring and the second spring such that the spring constant is proportional to a duration of voicing in the associated speech segment.

32. A system according to claim 30, wherein the fundamental frequency adjustment processor associates a spring constant with the third spring such that the third spring models a non-linear restoring force that resists a change in slope of the segment fundamental frequency contour.

33. A system according to claim 29, wherein the fundamental frequency adjustment processor forms a set of simultaneous equations corresponding to the coupled spring models associated with all of the concatenated speech segments, and solves the set of simultaneous equations to produce the parameters characterizing each linear function associated with one of the speech segments.

34. A system according to claim 33, wherein the fundamental frequency adjustment processor solves the set of simultaneous equations through an iterative algorithm based on Newton's method of finding zeros of a function.

36. A method of smoothing fundamental frequency discontinuities at boundaries of concatenated speech segments, each speech segment characterized by a segment fundamental frequency contour and including two or more frames, comprising:

adjusting the fundamental frequency contour of each speech segment according to a predetermined function calculated for each particular speech segment, wherein the predetermined function is dependent upon a length associated with the speech segment, such that the predetermined function adjusts longer segments more than shorter segments.

37. A system for smoothing fundamental frequency discontinuities at boundaries of concatenated speech segments, each speech segment characterized by a segment fundamental frequency contour and including two or more frames, comprising:

a fundamental frequency adjustment processor for adjusting the fundamental frequency contour of each speech segment according to a predetermined function calculated for each particular speech segment, wherein the predetermined function is dependent upon a length associated with the speech segment, such that the predetermined function adjusts longer segments more than shorter segments.